fix: harden NAR hash validation and improve URL parsing [backport #842] #858

kalbasit merged 1 commit into release-0.8 from
Conversation
This change strengthens the validation of NAR hashes by enforcing a strict Nix32 format (52 characters) and an optional narinfo hash prefix. The ParseURL function has been rewritten to be more robust, separating the hash and compression components more reliably and validating the hash before further processing.

Key improvements:

- Defined NormalizedHashPattern and HashPattern for precise hash validation.
- Enhanced ParseURL to correctly handle various NAR URL formats, including those with query parameters.
- Improved the Normalize method on the URL struct to better handle narinfo hash prefixes and to sanitize the hash against path traversal.
- Updated all relevant tests to use valid Nix32 hashes and removed redundant test cases.
- Ensured consistent URL construction in JoinURL by correctly handling paths and query parameters.

This hardening prevents potential security issues related to invalid or malicious NAR URLs and ensures consistent behavior across the application.

(cherry picked from commit 2bb8649)
Summary of Changes

Hello @kalbasit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the reliability and security of NAR hash validation and URL parsing within the system. By enforcing a strict Nix32 hash format and refining the URL-parsing logic, it prevents potential vulnerabilities and ensures consistent handling of NAR resources. Additionally, it improves the testability and robustness of S3 storage interactions by allowing custom HTTP transports and adding extensive error-path testing.
Related Documentation: 1 document may need updating based on files changed in this PR (kalbasit's Space, "S3-Compatible Storage Backend"). The suggested changes follow.

## S3-Compatible Storage Backend: Design & Implementation
The S3-compatible storage backend for ncps provides a scalable, cloud-native alternative to local filesystem storage. It is implemented in the `pkg/storage/s3` package and is a drop-in replacement for local storage, supporting AWS S3, MinIO, and other S3-compatible services. The backend implements the same storage interface, allowing seamless integration and migration between storage types. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/README.md#L3-L183)
### Configuration Options
Configuration is managed via CLI flags, environment variables, or configuration files. The core options are:
| Option | Env Variable | Required | Description |
|--------|--------------|----------|-------------|
The endpoint should be provided **without** the URL scheme in code (e.g., `"localhost:9000"`), but CLI flags may include the scheme (`http://` or `https://`). The scheme takes precedence over the `use-ssl` flag. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/README.md#L46-L782)
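The precedence rule above (scheme beats the `use-ssl` flag) could be implemented as follows. This is a minimal sketch under stated assumptions: `normalizeEndpoint` is a hypothetical name, not the project's actual helper.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEndpoint strips an optional http:// or https:// scheme from a
// CLI-provided endpoint and reports whether SSL should be used. When a
// scheme is present it overrides the separate use-ssl flag.
func normalizeEndpoint(endpoint string, useSSL bool) (string, bool) {
	if rest, ok := strings.CutPrefix(endpoint, "https://"); ok {
		return rest, true
	}
	if rest, ok := strings.CutPrefix(endpoint, "http://"); ok {
		return rest, false
	}
	return endpoint, useSSL
}

func main() {
	fmt.Println(normalizeEndpoint("http://localhost:9000", true)) // localhost:9000 false
	fmt.Println(normalizeEndpoint("s3.amazonaws.com", true))      // s3.amazonaws.com true
}
```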
#### Example: AWS S3
```bash
# Illustrative values; the flag names mirror the S3 options shown in the Kind example below.
ncps serve \
  --cache-hostname=ncps.example.com \
  --cache-storage-s3-bucket=my-nix-cache \
  --cache-storage-s3-endpoint=s3.us-east-1.amazonaws.com \
  --cache-storage-s3-use-ssl=true
```
#### Example: MinIO
```bash
# Illustrative values for a local MinIO instance with default credentials.
ncps serve \
  --cache-hostname=ncps.example.com \
  --cache-storage-s3-bucket=my-nix-cache \
  --cache-storage-s3-endpoint=localhost:9000 \
  --cache-storage-s3-access-key-id=minioadmin \
  --cache-storage-s3-secret-access-key=minioadmin \
  --cache-storage-s3-use-ssl=false
```
### Supported Providers
The backend supports AWS S3, MinIO, and any S3-compatible service (such as Ceph). Provider-specific notes:
- **AWS S3**: Requires region and SSL; endpoint is typically `s3.<region>.amazonaws.com`.
- **Other S3-Compatible**: Use the appropriate endpoint and credentials for your provider.
### Integration with the Storage Interface
The S3 backend implements the `storage.Store` interface, providing methods for CRUD operations on secret keys, narinfo files, and nar objects. This allows it to be used interchangeably with the local storage backend. Initialization is performed via:

```go
// Constructor sketch; the exact signature lives in pkg/storage/s3 and is
// assumed here:
store, err := s3.New(ctx, cfg)
if err != nil {
	return err
}
```

[Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/example.go#L16-L81)
### Path Sharding
To prevent performance bottlenecks from too many files in a single directory, the backend shards objects using the first one and two characters of the object's hash. For example, a narinfo file with hash `abc123` is stored at `store/narinfo/a/ab/abc123.narinfo`. This is implemented via helper functions:
```go
// Sketch of the sharding helpers in pkg/helper/filepath.go (names assumed):
func NarInfoFilePath(hash string) string {
	return filepath.Join("store", "narinfo", hash[0:1], hash[0:2], hash+".narinfo")
}
```
[Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/helper/filepath.go#L5-L38)
#### Storage Structure
```mermaid
graph TD
A["bucket"]
A --> B["store/narinfo/a/ab/abc123.narinfo"]
A --> C["store/nar/a/ab/abc123.nar"]
```
### Error Handling
The backend translates S3-specific errors to project-level errors. For example, S3 `NoSuchKey` errors become `storage.ErrNotFound`, and attempts to overwrite existing objects return `storage.ErrAlreadyExists`. Configuration is validated before initialization, and bucket existence is checked. All operations include OpenTelemetry tracing for observability. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/s3.go#L1-L554)
### Migration from Local Storage
To migrate from local storage to S3:
1. Create an S3 bucket or MinIO instance.
### Usage Examples
#### AWS S3 (Go)
```go
cfg := s3.Config{
Bucket: "my-nix-cache",
	// … endpoint, region, and credential fields elided
}
```
#### MinIO (Go)
```go
cfg := s3.Config{
Bucket: "my-nix-cache",
	// … endpoint and credential fields elided
}
```
#### Store Operations
```go
// Store a secret key
secretKey, err := signature.LoadSecretKey("your-secret-key-content")
// …
```
[Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/example.go#L16-L81)
### Development Setup with MinIO
To develop and test with MinIO:
1. Start MinIO locally (the dev script manages this for you):
The script auto-restarts on code changes and configures MinIO for local development. The S3 backend will use the MinIO instance at `localhost:9000` with default credentials (`minioadmin`/`minioadmin`). [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/README.md#L46-L782)
### Kubernetes Development Environment (Kind)

You can also develop and test with MinIO running inside a local Kubernetes cluster using [Kind](https://kind.sigs.k8s.io/) and the provided setup script:

1. Start the Kubernetes development environment:

   ```bash
   ./dev-scripts/k8s-cluster.sh
   ```

   This script will:
   - Create a Kind cluster (if not already running)
   - Deploy MinIO with a pre-created bucket (`ncps-bucket`) and access keys
   - Deploy PostgreSQL, MariaDB, and Redis instances
   - Output connection details for all services

2. To access MinIO from your host, port-forward the service:

   ```bash
   kubectl port-forward -n minio svc/minio 9000:9000
   ```

3. Use the following credentials and endpoint for your S3 backend configuration:

   | Option     | Value                 |
   |------------|-----------------------|
   | Endpoint   | http://localhost:9000 |
   | Bucket     | ncps-bucket           |
   | Access Key | ncps-access-key       |
   | Secret Key | ncps-secret-key       |
   | Use SSL    | false                 |

   Example CLI usage:

   ```bash
   ncps serve \
     --cache-storage-s3-bucket=ncps-bucket \
     --cache-storage-s3-endpoint=http://localhost:9000 \
     --cache-storage-s3-access-key-id=ncps-access-key \
     --cache-storage-s3-secret-access-key=ncps-secret-key \
     --cache-storage-s3-use-ssl=false
   ```

The script ensures MinIO is ready and pre-configured for development, so you do not need to manually create buckets or credentials. This environment is ideal for testing NCPS against S3-compatible storage and database backends in a reproducible local cluster.
+
### Security Considerations
Store credentials securely (prefer environment variables or IAM roles for AWS). Use SSL/TLS in production. For AWS, restrict access with IAM policies and consider temporary credentials. Enable bucket versioning and set appropriate bucket policies for access control. [Source](https://github.com/kalbasit/ncps/blob/b7653d4303670bb88fd34b8bf82ed3aade09885d/pkg/storage/s3/README.md#L3-L183)
Code Review
This pull request is a great improvement. It hardens NAR hash validation, making the system more secure. The refactoring of ParseURL is a significant enhancement in robustness and readability. Additionally, the introduction of a mock transport for the S3 storage backend and the extensive new tests greatly improve coverage and code quality. I found one issue with the use of filepath.Base for URL parsing, which can behave differently across operating systems.
```go
pathPart, rawQuery, _ := strings.Cut(u, "?")

// Get the filename (last component of the path)
filename := filepath.Base(pathPart)
```
The filepath package is intended for operating system-specific file paths, which can lead to incorrect behavior when parsing URLs on different platforms (e.g., Windows vs. Linux). For handling URL paths, the path package should be used to ensure consistent behavior.
For example, on Windows, filepath.Base("nar/somehash.nar") would return "nar/somehash.nar", causing the parsing to fail. Using path.Base will correctly return "somehash.nar" on all systems.
Please import the path package and use path.Base here.
```diff
- filename := filepath.Base(pathPart)
+ filename := path.Base(pathPart)
```